Abstract. Asymptotic local equivalence in the sense of Le Cam is established for inference on the drift in multidimensional ergodic diffusions and an accompanying sequence of Gaussian shift experiments. The nonparametric local neighbourhoods can be attained for any dimension, provided the regularity of the drift is sufficiently large. In addition, a heteroskedastic Gaussian regression experiment is given, which is also locally asymptotically equivalent and which does not depend on the centre of localisation. For one direction of the equivalence an explicit Markov kernel is constructed.



1. Introduction 



Asymptotic equivalence is a powerful concept for analysing statistical inference problems by a transfer to the analogous problem in a simpler statistical experiment. Breakthroughs were the results by Brown and Low and by Nussbaum [18], who established asymptotic equivalence of the two classical experiments, one-dimensional Gaussian regression and density estimation, with an accompanying sequence of Gaussian shift experiments. In this paper we consider the statistical inference for the drift in a multidimensional diffusion experiment under stationarity assumptions and prove the asymptotic equivalence with corresponding multidimensional Gaussian shift and regression experiments.

Asymptotic equivalence results for dependent data are not very numerous; see Dalalyan and Reiß [10] for an overview. Even for simple experiments such as the classical ones described above, results for asymptotic equivalence in the multidimensional case are very scarce. We only know of the recent work by Carter [8], who proves asymptotic equivalence for two-dimensional Gaussian regression but argues that his method fails for higher dimensions. One of the main reasons for the difficulties in transferring methods to higher dimensions is that piecewise constant approximations of the unknown functional parameter usually no longer suffice and higher order approximations have to be used, which creates unexpected problems. Brown and Zhang remark that the two classical experiments and their accompanying Gaussian shift experiments are not asymptotically equivalent in the case of nonparametric classes of Hölder regularity $\beta\le d/2$, where $d$ denotes the dimension.

The methodology we applied in [10] to establish asymptotic equivalence for scalar diffusions relied heavily on the concept of local time. For multidimensional diffusions local time does not exist. This might explain why the statistical theory for scalar diffusions is very well developed (see Kutoyants [15]), while inference problems for multidimensional diffusions are more involved and much less studied. We refer to Bandi and Moloche for the analysis of kernel estimators for the drift vector and the diffusion matrix, and to Aït-Sahalia for a recent discussion of applications of multidimensional diffusion processes in econometrics.

In Section 2 we review results for multidimensional diffusions and construct estimators for the invariant density and the drift vector. Interestingly, for $d\ge2$ the estimator of the invariant density converges with a rate which is slower than parametric, but faster than in classical $d$-dimensional density estimation problems. The local equivalence result of the multidimensional diffusion experiment with an accompanying Gaussian shift experiment is formulated and described in Section 3. The local neighbourhoods can be attained for drift functions in a nonparametric class of regularity $\beta>\big(d-1+\sqrt{2(d-1)^2-1}\big)/2$ for any dimension $d\ge2$. In Section 4 the corresponding equivalence with a heteroskedastic regression experiment, which does not depend on the centre of localisation, is treated. This can be used to establish global equivalence with a single experiment. Even in the one-dimensional case, such global equivalence cannot be obtained for the Gaussian shift experiment, because no variance stabilising transform is available, as was first noted by Delattre and Hoffmann [11]. The explicit construction of a Markov kernel establishing the important part of the asymptotic equivalence is presented in Section 5. The proof of the main local equivalence result is deferred to Section 6.
2. Preliminaries

2.1. Diffusion processes 

We assume that a continuous record $X^T=\{X_t,\ 0\le t\le T\}$ of a $d$-dimensional diffusion process $X$ is observed up to time instant $T$. This diffusion process is supposed to be given as a solution of the stochastic differential equation

$$dX_t=b(X_t)\,dt+dW_t,\qquad X_0=\xi,\qquad t\in[0,T], \tag{1}$$

where $b:\mathbb R^d\to\mathbb R^d$, $W=\{W_t,\ t\ge0\}$ is a $d$-dimensional Brownian motion and $\xi$ is a random vector independent of $W$. We denote by $b_i:\mathbb R^d\to\mathbb R$, $i=1,\dots,d$, the components of the vector valued function $b$. In what follows, we assume that the drift is of the form $b=-\nabla V$, where $V\in C^2(\mathbb R^d)$ is referred to as the potential. This restriction permits us to use strong analytical results for the Markov semigroup of the diffusion on the $L^2$-space generated by the invariant measure.
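To make model (1) concrete, the following minimal sketch simulates a path on an equidistant time grid with the Euler-Maruyama scheme. The time grid is an assumption of the sketch, not of the paper, which works with a continuously observed path. The helper name `simulate_gradient_diffusion` and the quadratic potential in the example are illustrative choices. For $V(x)=M|x|^2/2$ the drift $b(x)=-Mx$ satisfies (2) and (3) below with $M_1=M_2=M$, and the invariant density is proportional to $e^{-2V}$, that is, Gaussian with covariance $(2M)^{-1}I_d$.

```python
import math
import random


def simulate_gradient_diffusion(grad_V, x0, T, dt, seed=0):
    """Euler-Maruyama sketch for dX_t = -grad V(X_t) dt + dW_t on [0, T].

    Returns the grid values X_0, X_dt, ..., X_T as a list of lists.
    """
    rng = random.Random(seed)
    n_steps = int(round(T / dt))
    sqrt_dt = math.sqrt(dt)
    x = list(x0)
    path = [list(x)]
    for _ in range(n_steps):
        g = grad_V(x)
        # drift b(x) = -grad V(x); each Brownian increment is N(0, dt)
        x = [xi - gi * dt + sqrt_dt * rng.gauss(0.0, 1.0) for xi, gi in zip(x, g)]
        path.append(list(x))
    return path


# Example: V(x) = M|x|^2 / 2, so b(x) = -M x and mu_b = N(0, I_d / (2M)).
M, d = 1.0, 2
init_rng = random.Random(1)
# draw X_0 from the invariant law, so that the simulated process is (nearly) stationary
x0 = [init_rng.gauss(0.0, math.sqrt(1.0 / (2.0 * M))) for _ in range(d)]
path = simulate_gradient_diffusion(lambda x: [M * xi for xi in x], x0, T=200.0, dt=0.01)
# time average of |X_t|^2 / d estimates the stationary variance 1 / (2M) = 0.5
emp_var = sum(sum(xi * xi for xi in p) for p in path) / (len(path) * d)
```

Up to discretisation bias and Monte Carlo error, `emp_var` should be close to $1/(2M)=0.5$.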

For positive constants $M_1$ and $M_2$, we define $\Sigma(M_1,M_2)$ as the set of all functions $b=-\nabla V:\mathbb R^d\to\mathbb R^d$ satisfying, for any $x,y\in\mathbb R^d$,

$$|b(x)|\le M_1(1+|x|), \tag{2}$$

$$(b(x)-b(y))^\top(x-y)\le-M_2|x-y|^2, \tag{3}$$

where $|\cdot|$ denotes the Euclidean norm in $\mathbb R^d$. Any such function $b$ is locally Lipschitz-continuous. Therefore equation (1) has a unique strong solution, which is a homogeneous continuous Markov process, cf. Rogers and Williams, Thm. 12.1. Set $C_b=\int_{\mathbb R^d}e^{-2V(u)}\,du$ and

$$\mu_b(x)=C_b^{-1}e^{-2V(x)},\qquad x\in\mathbb R^d.$$

Under condition (3) we have $C_b<\infty$ and the process $X$ is ergodic with unique invariant probability measure (Bhattacharya, Thm. 3.5). Moreover, the invariant probability measure of $X$ is absolutely continuous with respect to the Lebesgue measure and its density is $\mu_b$. From now on, we assume that the initial value $\xi$ in (1) follows the invariant law $\mu_b$, such that the process $X$ is strictly stationary. We denote by $P_b^T$ the law of this process induced on the canonical space $(C([0,T];\mathbb R^d),\mathcal B_{C([0,T];\mathbb R^d)})$ and by $E_b$ the expectation operator with respect to this law. We write $\mu_b(f):=E_b[f(X_0)]=\int f\mu_b$. Let $P_{b,t}$ be the transition semigroup of this process on $L^2(\mu_b)$, that is

$$P_{b,t}f(x)=E_b[f(X_t)\mid X_0=x],\qquad f\in L^2(\mu_b)=\Big\{f:\mathbb R^d\to\mathbb R:\int|f|^2\mu_b<\infty\Big\}.$$

The transition density is denoted by $p_{b,t}$, that is $P_{b,t}f(x)=\int f(y)p_{b,t}(x,y)\,dy$.
2.2. Estimators of drift and invariant density



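Some notation. We write $A(p)\lesssim B(p)$ when $A(p)$ is bounded by a constant multiple of $B(p)$ uniformly over the parameter values, that is $A(p)=O(B(p))$ using the Landau symbol. Similarly, $A(p)\sim B(p)$ means that $A(p)\lesssim B(p)$ as well as $B(p)\lesssim A(p)$. We denote by $|A|$ the Lebesgue measure and by $\operatorname{diam}(A)$ the diameter of a Borel set $A\subset\mathbb R^d$.

For any multi-index $\alpha\in\mathbb N^d$ and $x\in\mathbb R^d$ we set $|\alpha|=\alpha_1+\dots+\alpha_d$ and $x^\alpha=x_1^{\alpha_1}\cdot\ldots\cdot x_d^{\alpha_d}$. Let us introduce the Hölder class

$$\mathcal H(\beta,L)=\Big\{f:\mathbb R^d\to\mathbb R:\ |D^\alpha f(x)-D^\alpha f(y)|\le L|x-y|^{\beta-\lfloor\beta\rfloor}\ \ \forall x,y\in\mathbb R^d,\ \text{for any }\alpha\text{ such that }|\alpha|=\lfloor\beta\rfloor\Big\},$$

where $\lfloor\beta\rfloor$ is the largest integer strictly smaller than $\beta$ and $D^\alpha f:=\dfrac{\partial^{|\alpha|}f}{\partial x_1^{\alpha_1}\cdots\partial x_d^{\alpha_d}}$.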

The construction. Let us assume that the potential $V$ lies in $\mathcal H(\beta+1,L)$ for some $\beta,L>0$, which implies $b_i\in\mathcal H(\beta,L)$. Furthermore, if for some constant $C_1>0$ we have

$$\max_{i=1,\dots,d}\ \max_{\alpha:\,|\alpha|\le\lfloor\beta\rfloor}|D^\alpha b_i(0)|\le C_1, \tag{4}$$

then the function $\mu_b$ is Hölder continuous of order $\beta+1$ in any bounded set $A\subset\mathbb R^d$, that is

$$|D^\alpha\mu_b(x)-D^\alpha\mu_b(y)|\le L_\mu|x-y|^{\beta-\lfloor\beta\rfloor}\qquad\forall\alpha\in\mathbb N^d:\ |\alpha|=\lfloor\beta\rfloor+1,$$

for all $x,y\in A$ and for some constant $L_\mu$. We denote by $\mathcal H(\beta,L,C_1)$ the set of all functions $b$ such that $b_i\in\mathcal H(\beta,L)$ and (4) is fulfilled.

A natural kernel estimator for the invariant density based on the observation $X^T$ is given by

$$\hat\mu_{h,T}(x)=\frac1T\int_0^T K_h(x-X_t)\,dt,\qquad x\in\mathbb R^d. \tag{5}$$

Here, $K_h(x)=h^{-d}K(h^{-1}x)$ and $K:\mathbb R^d\to\mathbb R$ is a smooth kernel function of compact support, satisfying $\int K(x)\,dx=1$ and $\int K(x)x^\alpha\,dx=0$ whenever $1\le|\alpha|\le\lfloor\beta\rfloor+1$. The usual bias-variance decomposition and approximation inequality yield (Efromovich, § 8.9)

$$E_b\big[|\hat\mu_{h,T}(x)-\mu_b(x)|^2\big]\lesssim h^{2\beta+2}+T^{-2}\operatorname{Var}_b\Big[\int_0^T K_h(x-X_t)\,dt\Big]. \tag{6}$$
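The estimator (5) only needs the path and a kernel. The following sketch approximates the time integral by a left-point Riemann sum on an equidistant grid. That grid is an assumption of the sketch, since the paper uses the continuous record. As a stand-in for the smooth kernel $K$ it uses the product triweight kernel. This kernel is compactly supported and symmetric, so only its first moments vanish, and it therefore meets the moment conditions stated above only when $\lfloor\beta\rfloor+1\le1$. All function names are hypothetical.

```python
def triweight_product_kernel(u):
    """Product triweight kernel on [-1, 1]^d, normalised to integrate to one.

    Stand-in for the smooth kernel K of (5): symmetric, so only first moments vanish.
    """
    val = 1.0
    for ui in u:
        if abs(ui) >= 1.0:
            return 0.0
        val *= 35.0 / 32.0 * (1.0 - ui * ui) ** 3
    return val


def kernel_density_estimate(x, path, dt, h, kernel=triweight_product_kernel):
    """Left-point Riemann sum for (5): (1/T) * int_0^T K_h(x - X_t) dt, K_h(y) = h^-d K(y/h).

    `path` holds X_0, X_dt, ..., X_T on an equidistant grid with step dt.
    """
    d = len(x)
    n = len(path) - 1  # number of time steps, so T = n * dt
    T = n * dt
    scale = h ** (-d)
    total = 0.0
    for p in path[:-1]:
        u = [(xi - pi) / h for xi, pi in zip(x, p)]
        total += scale * kernel(u) * dt
    return total / T
```

For a constant path sitting at the evaluation point, the estimate reduces to $K_h(0)=h^{-d}K(0)$, which gives a convenient sanity check.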
By analogy with the model of regression with random design, a reasonable estimator of $b$ is obtained by setting

$$\hat b_{h,T}(x)=\frac{\int_0^T K_h(x-X_t)\,dX_t}{T\max\{\hat\mu_{h,T}(x),\mu_*(x)\}}, \tag{7}$$

where $\mu_*(x)>0$ is some a priori lower bound on $\mu_b(x)$, see Remark 6 below. A similar risk analysis gives for $i=1,\dots,d$:

$$E_b\big[|\hat b_{i,h,T}(x)-b_i(x)|^2\big]\lesssim h^{2\beta}+\frac{1}{Th^d}+T^{-2}\operatorname{Var}_b\Big[\int_0^T K_h(x-X_t)b_i(X_t)\,dt\Big]. \tag{8}$$
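Similarly, (7) can be sketched by replacing the stochastic integral $\int_0^T K_h(x-X_t)\,dX_t$ with its Itô (left-point) sum $\sum_k K_h(x-X_{t_k})(X_{t_{k+1}}-X_{t_k})$ over an observation grid. The grid, the triweight stand-in kernel and the function names are assumptions of this illustration. The argument `mu_lower` plays the role of the a priori bound $\mu_*(x)$.

```python
def triweight_product_kernel(u):
    """Product triweight kernel on [-1, 1]^d (stand-in for the smooth kernel K)."""
    val = 1.0
    for ui in u:
        if abs(ui) >= 1.0:
            return 0.0
        val *= 35.0 / 32.0 * (1.0 - ui * ui) ** 3
    return val


def drift_estimate(x, path, dt, h, mu_lower):
    """Discretised version of (7).

    Numerator: Ito sum approximating int_0^T K_h(x - X_t) dX_t (one entry per coordinate).
    Denominator: T * max(mu_hat(x), mu_lower), with mu_hat the Riemann-sum version of (5).
    """
    d = len(x)
    n = len(path) - 1
    T = n * dt
    scale = h ** (-d)
    numer = [0.0] * d
    mu_hat = 0.0
    for k in range(n):
        p, q = path[k], path[k + 1]
        w = scale * triweight_product_kernel([(xi - pi) / h for xi, pi in zip(x, p)])
        mu_hat += w * dt
        for i in range(d):
            numer[i] += w * (q[i] - p[i])  # increment of the i-th coordinate of X
    mu_hat /= T
    denom = T * max(mu_hat, mu_lower)
    return [ni / denom for ni in numer]
```

For a single step from $0$ to $(0.3,0)$ in time $0.1$, the estimate at $x=0$ is the difference quotient $(3,0)$ as long as the lower bound is inactive. A large `mu_lower` shrinks the estimate, which is the safeguard built into (7).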



Asymptotic results. In order to determine the asymptotic behaviour for $T\to\infty$, we study the variance of general additive functionals of $X$ in $d$ dimensions. To do so, we assume that the semigroup $P_{b,t}$ enjoys the following properties.

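Assumption 1 (spectral gap inequality). There exists a $\rho>0$ such that for any $f\in L^2(\mu_b)$ and for any $t>0$

$$\|P_{b,t}f-\mu_b(f)\|_{L^2(\mu_b)}\le e^{-\rho t}\,\|f-\mu_b(f)\|_{L^2(\mu_b)}.$$

Assumption 2. There is a $C_0>0$ such that for any $t>0$ and for any pair of points $x,y\in\mathbb R^d$ satisfying $|x-y|^2\le t$, we have

$$p_{b,t}(x,y)\le C_0\big(t^{-d/2}+t^{3d/2}\big).$$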

Remark 1. Due to Remark 4.14 in Chen and Wang, Assumption 1 is fulfilled with $\rho=M_2$ whenever (3) holds.

Remark 2. If $b$ fulfils (2), then Assumption 2 can be deduced from Qian and Zheng [20, Thm. 3.2]. Indeed, taking $q=1+1/t$ in that inequality and bounding the terms $C_q$ and $\rho_q$ respectively by an explicit function of $q$ and by $C_0$, we get the desired inequality. If moreover $b$ is bounded, Assumption 2 is satisfied for every $(x,y)\in\mathbb R^d\times\mathbb R^d$ and without the term $t^{3d/2}$ on the right-hand side, cf. Qian et al. [19, inequality (5)].

Proposition 1. Let $r$ be a positive number and $f:\mathbb R^d\to\mathbb R$ be a bounded, measurable function with support $S$ satisfying $\operatorname{diam}(S)^d\le r^d|S|$ and $|S|\le1$. Under Assumptions 1 and 2 there exists a constant $C$, depending only on $r$, on $d\ge2$ and on $C_0$ and $\rho$ from Assumptions 1 and 2, such that

$$\operatorname{Var}_b\Big(\int_0^T f(X_t)\,dt\Big)\le CT\|f\|_\infty^2\,\mu_b(S)\,|S|\,\psi_d^2(|S|),$$

where $\|f\|_\infty=\sup_{x\in\mathbb R^d}|f(x)|$ and

$$\psi_d(x)=\begin{cases}\max\big(1,(\log(1/x))^2\big), & d=2,\\[2pt] x^{1/d-1/2}, & d\ge3.\end{cases}$$

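Proof. Set $f_c=f-\mu_b(f)$. Symmetry and stationarity yield

$$\operatorname{Var}_b\Big(\int_0^T f(X_t)\,dt\Big)=2\int_0^T\!\!\int_0^{T-s}E_b[f_c(X_0)f_c(X_t)]\,dt\,ds=2\int_0^T(T-u)\,E_b[f_c(X_0)f_c(X_u)]\,du\le2T\int_0^T(f_c,P_{b,u}f_c)_{\mu_b}\,du,$$

where $(\cdot,\cdot)_{\mu_b}$ denotes the scalar product in $L^2(\mu_b)$.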
Jo 

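Let $0<\delta<D<T$, where the specific choice of $\delta$ and $D$ is given later. Assumption 1 yields $\|P_{b,u}f_c\|_{L^2(\mu_b)}\le e^{-\rho u}\|f\|_{L^2(\mu_b)}$, and therefore

$$\int_{[0,\delta]\cup[D,T]}(f_c,P_{b,u}f_c)_{\mu_b}\,du\le\big(\delta+\rho^{-1}e^{-\rho D}\big)\|f\|_{L^2(\mu_b)}^2\lesssim\big(\delta+e^{-\rho D}\big)\,\mu_b(S)\,\|f\|_\infty^2. \tag{9}$$

For moderate values $u\in[\delta,D]$ we use

$$(f_c,P_{b,u}f_c)_{\mu_b}\le(f,P_{b,u}f)_{\mu_b}\le\int|f(x)|\Big(\int p_{b,u}(x,y)|f(y)|\,dy\Big)\mu_b(x)\,dx.$$

For $\delta\ge\operatorname{diam}(S)^2$ we infer from Assumption 2

$$(f,P_{b,u}f)_{\mu_b}\le C_0\big(u^{-d/2}+u^{3d/2}\big)\,\|f\|_\infty\,\mu_b(S)\int|f(y)|\,dy\qquad\forall u\ge\delta. \tag{10}$$

Combining (9) and (10) and assuming $\operatorname{diam}(S)\le\delta^{1/2}$, for $d>2$ we find

$$\operatorname{Var}_b\Big(\int_0^T f(X_t)\,dt\Big)\lesssim T\|f\|_\infty^2\,\mu_b(S)\Big(\delta+e^{-\rho D}+|S|\big(\delta^{1-d/2}+D^{3d/2+1}\big)\Big).$$

Balancing the terms, we choose $D\sim\max\big(-\rho^{-1}\log(|S|),r^2\big)$ and $\delta=r^2|S|^{2/d}$. This gives the asserted estimate because we had assumed $\operatorname{diam}(S)\le r|S|^{1/d}$. The case $d=2$ can be treated similarly. □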

Remark 3. In the case $d=1$ the bound holds with $\psi_1(x)=1$, cf. Proposition 5.1 in Dalalyan and Reiß [10].

Remark 4. The dimensional effect is due to the singular behaviour of $p_{b,t}(x,y)$ for $t\to0$. However, if the term $t^{3d/2}$ is absent in Assumption 2, then in the definition of $\psi_2$ the term $(\log(1/|S|))^2$ can be replaced by $(\log(1/|S|))^{1/2}$. This is the case when the drift is bounded.
Corollary 1. If $b\in\mathcal H(\beta,L,C_1)\cap\Sigma(M_1,M_2)$, the estimators given in (5) and (7) satisfy, for $h$ sufficiently small, the following risk estimates:

$$E_b\big[(\hat\mu_{h,T}(x)-\mu_b(x))^2\big]\lesssim h^{2\beta+2}+T^{-1}\psi_d^2(h^d),$$

$$E_b\big[|\hat b_{i,h,T}(x)-b_i(x)|^2\big]\lesssim h^{2\beta}+T^{-1}h^{-d}+T^{-1}\psi_d^2(h^d).$$

The rate-optimal choice $h\sim h(T)\sim T^{-1/(2\beta+d)}$ yields the rates

$$E_b\big[(\hat\mu_{h,T}(x)-\mu_b(x))^2\big]^{1/2}\lesssim\begin{cases}T^{-1/2}(\log T)^2, & d=2,\\[2pt] T^{-(\beta+1)/(2\beta+d)}, & d\ge3,\end{cases}\qquad E_b\big[|\hat b_{i,h,T}(x)-b_i(x)|^2\big]^{1/2}\lesssim T^{-\beta/(2\beta+d)}.$$

Proof. The risk bound for $\hat\mu_{h,T}$ follows from $|\operatorname{supp}(K_h)|\sim h^d$, $\|\mu_b\|_\infty\lesssim1$ and an application of Proposition 1 to the bias-variance decomposition (6) for any $h$ sufficiently small. In the same way, we obtain the estimate for each $\hat b_{i,h,T}$, and the rates follow by simple substitution. □
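The substitution in the proof is elementary exponent arithmetic. The following sketch, with hypothetical function names, computes with exact fractions the exponents $r$ in $E_b[\,\cdot\,]^{1/2}\lesssim T^{-r}$ for $h\sim T^{-1/(2\beta+d)}$, ignoring the logarithmic factor when $d=2$. It also checks that each bias and variance term decays at the stated rate. For the variance term of $\hat\mu_{h,T}$ it takes $\psi_d(x)=x^{1/d-1/2}$ for $d\ge3$; this is an assumption of the sketch, chosen to reproduce the rates stated above.

```python
from fractions import Fraction


def corollary1_exponents(beta, d):
    """Exponents (r_mu, r_b) with risk^(1/2) ~ T^(-r) for h ~ T^(-1/(2 beta + d)), d >= 2."""
    beta = Fraction(beta)
    return (beta + 1) / (2 * beta + d), beta / (2 * beta + d)


def term_exponents(beta, d):
    """Decay exponents (in T) of the square roots of the individual risk terms, d >= 3."""
    beta = Fraction(beta)
    c = 1 / (2 * beta + d)                                # h = T^(-c)
    bias_mu = (beta + 1) * c                              # h^(beta + 1)
    var_mu = Fraction(1, 2) + (1 - Fraction(d, 2)) * c    # T^(-1/2) psi_d(h^d) with psi_d(x) = x^(1/d - 1/2)
    bias_b = beta * c                                     # h^beta
    var_b = Fraction(1, 2) - Fraction(d, 2) * c           # (T h^d)^(-1/2)
    return bias_mu, var_mu, bias_b, var_b
```

For $\beta=2$ and $d=3$ the bias and variance exponents coincide, at $3/7$ for the density and $2/7$ for the drift, which is exactly what "rate-optimal" means here.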



Remark 5. The convergence rates for the risk of $\hat\mu$ are to be compared with the one-dimensional case, where the parametric rate $T^{-1/2}$ is obtained. They should also be compared with standard multivariate density estimation, where the corresponding rate is $n^{-(\beta+1)/(2(\beta+1)+d)}$ for $n$ observations, which is considerably larger. In contrast, the rate for $\hat b$ corresponds exactly to the classical rate $n^{-\beta/(2\beta+d)}$ in regression or density estimation.



Remark 6. Using conditions (2), (3) and the equality $V(x)=V(0)-\int_0^1 b(tx)^\top x\,dt$, we find

$$-M_1|x|+\tfrac12M_2|x|^2\le V(x)-V(0)\le\tfrac12M_1|x|^2+M_1|x|.$$

Therefore, we can take

$$\mu_*(x)=\frac{e^{-M_1|x|^2-2M_1|x|}}{\int_{\mathbb R^d}e^{2M_1|y|-M_2|y|^2}\,dy}$$

as an a priori lower bound for $\mu_b(x)$. Moreover, due to assumption (4) the function $\mu_b$ is Hölder continuous in $A_\delta=\{x\in\mathbb R^d:\inf_{y\in A}|x-y|<\delta\}$ for any $\delta>0$ and for any bounded set $A\subset\mathbb R^d$. Therefore we do not need to modify the kernel estimators at the boundaries of $A$, and the inequalities of Corollary 1 hold uniformly in $b$ and in $x\in A$.
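The lower bound $\mu_*$ only involves a radial integral. The sketch below, with hypothetical helper names, evaluates it in polar coordinates with the trapezoidal rule; the truncation radius is a heuristic choice. In the example, the drift $b(x)=-x$ satisfies (2) and (3) with $M_1=M_2=1$, so $\mu_*$ must lie below the exact invariant density $\pi^{-d/2}e^{-|x|^2}$.

```python
import math


def unit_sphere_area(d):
    """Surface area of the unit sphere in R^d."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)


def radial_integral(M1, M2, d, n=20000):
    """Trapezoidal approximation of int_{R^d} exp(2 M1 |y| - M2 |y|^2) dy in polar coordinates."""
    # heuristic cut-off: the integrand is negligible beyond this radius
    r_max = M1 / M2 + 12.0 / math.sqrt(M2) + 1.0
    step = r_max / n
    total = 0.0
    for k in range(n + 1):
        r = k * step
        f = r ** (d - 1) * math.exp(2.0 * M1 * r - M2 * r * r)
        total += 0.5 * f if k in (0, n) else f
    return unit_sphere_area(d) * total * step


def mu_lower_bound(x, M1, M2):
    """A priori lower bound mu_*(x) from Remark 6."""
    nx = math.sqrt(sum(xi * xi for xi in x))
    return math.exp(-M1 * nx * nx - 2.0 * M1 * nx) / radial_integral(M1, M2, len(x))
```

For $d=2$ and $M_1=M_2=1$ the radial integral has the closed form $2\pi\big(\tfrac12+e\,\tfrac{\sqrt\pi}{2}(1+\operatorname{erf}(1))\big)$, which can be used to check the quadrature.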



Remark 7. Corollary 1 describes the rates of convergence of estimators for the local risk, that is for a pointwise loss function. To attain the local neighbourhood defined in the next section, the risk given by the sup-norm loss must be studied. In the classical problems of nonparametric estimation, the rates of convergence for the sup-norm loss on a compact set coincide up to a logarithmic factor with the local rates of convergence (Korostelev and Nussbaum; Giné, Koltchinskii and Zinn). The extension from the pointwise to the uniform loss result is usually fairly standard, but more involved, and lies outside the scope of this paper.
3. Equivalence with the Gaussian shift model



3.1. Statement of the result 



Let $\Sigma_\beta(L,M_1,M_2)$ be the set of functions $b\in\Sigma(M_1,M_2)$ such that all $d$ components $b_i$ of $b$ are in $\mathcal H(\beta,L)$. We fix a function $b^\circ\in\Sigma_\beta(L,M_1,M_2)$. Our main result establishes a local asymptotic equivalence between diffusion and Gaussian shift models in the local setting, that is, when the parameter set is a shrinking neighbourhood of $b^\circ$. Throughout, $\mathcal B_E$ denotes the Borel σ-algebra of a topological space $E$.



Definition 1 (diffusion experiment). Suppose $\Sigma\subset\Sigma(M_1,M_2)$ for some $M_1,M_2>0$. For any $T>0$ let $E(\Sigma,T)$ be the statistical experiment of observing the diffusion defined by (1) with $b\in\Sigma$, that is

$$E(\Sigma,T)=\big(C([0,T];\mathbb R^d),\ \mathcal B_{C([0,T];\mathbb R^d)},\ (P_b^T)_{b\in\Sigma}\big).$$



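For any function $b\in L^2(\mu_{b^\circ};\mathbb R^d):=\{f:\mathbb R^d\to\mathbb R^d:\int|f|^2\mu_{b^\circ}<\infty\}$ denote by $Q_{b,T}$ the Gaussian measure on $(C(\mathbb R^d;\mathbb R^d),\mathcal B_{C(\mathbb R^d;\mathbb R^d)})$ induced by the $d$-dimensional process $Z$ satisfying

$$dZ(x)=b(x)\sqrt{\mu_{b^\circ}(x)}\,dx+T^{-1/2}\,dB(x),\qquad Z(0)=0,\quad x\in\mathbb R^d, \tag{11}$$

where $B(x)=(B_1(x),\dots,B_d(x))$ and $B_1,\dots,B_d$ are independent $d$-variate Brownian sheets, that is, zero mean Gaussian processes with $\operatorname{Cov}(B_i(x),B_i(y))=|R_x\cap R_y|$, where $R_x=\{u\in\mathbb R^d:u_j\in[0,x_j],\ j=1,\dots,d\}$.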



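Definition 2 (Gaussian shift experiment). For $\Sigma\subset L^2(\mu_{b^\circ};\mathbb R^d)$ and $T>0$ let $F(\Sigma,T)$ be the Gaussian shift experiment with $b\in\Sigma$, that is

$$F(\Sigma,T)=\big(C(\mathbb R^d;\mathbb R^d),\ \mathcal B_{C(\mathbb R^d;\mathbb R^d)},\ (Q_{b,T})_{b\in\Sigma}\big).$$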



For any positive numbers $\varepsilon$, $\eta$ and for any hypercube $A\subset\mathbb R^d$, we define the local neighbourhood of $b^\circ$

$$\Sigma(b^\circ,\varepsilon,\eta,A)=\Big\{b\in\Sigma_\beta(L,M_1,M_2):\ |b(x)-b^\circ(x)|\le\varepsilon\mathbf 1_A(x)\ \ \forall x\in\mathbb R^d,\ \ |\mu_b(x)-\mu_{b^\circ}(x)|\le\eta\ \ \forall x\in A\Big\},$$

where $\mathbf 1_A$ is the indicator function of the set $A$. We state the main local equivalence result, which will be proved in Section 6. The main ideas of the proof are explained in the next subsection. For the exact definition of statistical equivalence and of the Le Cam distance $\Delta$ we refer to Le Cam and Yang [16].
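Theorem 1. If $\varepsilon_T$ and $\eta_T$ satisfy the conditions

$$\lim_{T\to\infty}T^{-\beta}\varepsilon_T^{2-d}=\lim_{T\to\infty}T^{\frac14+\frac{d-2}{8\beta}}\,\varepsilon_T\,\big(\log(T\varepsilon_T^{-1})\big)^{\mathbf 1_{\{d=2\}}}=\lim_{T\to\infty}T\eta_T\varepsilon_T^2=0,$$

then the diffusion model $E(\Sigma_{0,T},T)$ is asymptotically equivalent to the Gaussian shift model $F(\Sigma_{0,T},T)$ over the parameter set $\Sigma_{0,T}=\Sigma(b^\circ,\varepsilon_T,\eta_T,A)$, that is

$$\lim_{T\to\infty}\ \sup_{b^\circ\in\Sigma_\beta(L,M_1,M_2)}\Delta\big(E(\Sigma_{0,T},T),F(\Sigma_{0,T},T)\big)=0.$$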

Let us see for which Hölder regularity $\beta$ of the drift an estimator can attain the local neighbourhood, that is, for which $\beta$ the bounds $|\hat b_{h(T),T}(x)-b(x)|\le\varepsilon_T$ and $|\hat\mu_{h(T),T}(x)-\mu_b(x)|\le\eta_T$ hold with a probability tending to one (cf. Nussbaum [18] for this concept). By the rates obtained in Corollary 1, with a glance at Remark 7, and the conditions in Theorem 1, this is the case if

$$-\beta-\frac{(2-d)\beta}{2\beta+d}<0,\qquad\frac14+\frac{d-2}{8\beta}-\frac{\beta}{2\beta+d}<0,\qquad1-\frac{\beta+1}{2\beta+d}-\frac{2\beta}{2\beta+d}<0.$$

It turns out that the second condition is the most binding, and all three conditions are satisfied if $\beta>\big(d-1+\sqrt{2(d-1)^2-1}\big)/2$. The critical regularity thus grows like $(1/2+1/\sqrt2)\,d$ for $d\to\infty$. In dimension 2 we obtain the condition $\beta>1$, as in the result by Carter [8] for Gaussian regression. Whether asymptotic equivalence fails for Hölder classes of smaller regularity remains a challenging open problem.
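The three inequalities and the closed-form threshold can be checked numerically. The sketch below uses illustrative function names. It evaluates the left-hand sides for given $\beta$ and $d$ and the threshold $\beta_{\mathrm{crit}}(d)=\big(d-1+\sqrt{2(d-1)^2-1}\big)/2$, at which the second expression vanishes.

```python
import math


def critical_beta(d):
    """Threshold (d - 1 + sqrt(2 (d - 1)^2 - 1)) / 2 from the text, for d >= 2."""
    return (d - 1 + math.sqrt(2 * (d - 1) ** 2 - 1)) / 2


def theorem1_conditions(beta, d):
    """Left-hand sides of the three displayed inequalities; all must be negative."""
    c1 = -beta - (2 - d) * beta / (2 * beta + d)
    c2 = 0.25 + (d - 2) / (8 * beta) - beta / (2 * beta + d)
    c3 = 1 - (beta + 1) / (2 * beta + d) - 2 * beta / (2 * beta + d)
    return c1, c2, c3
```

For example, with $d=2$ all three expressions are negative at $\beta=1.01$, while the second one is positive at $\beta=0.99$. Also, $\beta_{\mathrm{crit}}(d)/d$ approaches $1/2+1/\sqrt2\approx1.207$ for large $d$.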



3.2. Method of proof 

The general idea of the proof of Theorem 1 consists in discretising (in space) the diffusion process such that the design regularisation technique we introduced in [10] is applicable in spirit, even though the local time does not exist.



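Space discretisation. For any multi-index $\alpha\in\mathbb N^d$ set $\alpha!=\alpha_1!\cdot\ldots\cdot\alpha_d!$. Let us denote by $\{v_i\}_{i=1,\dots,K}$ the elements of the set $\{v:\mathbb R^d\to\mathbb R:\ v(x)=x^\alpha\text{ with }|\alpha|\le\lfloor\beta\rfloor\}$, somehow enumerated: $v_i(x)=x_1^{\alpha_1(i)}\cdot\ldots\cdot x_d^{\alpha_d(i)}=x^{\alpha(i)}$. We assume that $A=[-a,a]^d$ is a hypercube, and for some $h>0$ with $a/h\in\mathbb N$ we denote by $\{a_m\}_{m=1,\dots,M}$ the elements of the grid $(h\mathbb Z^d)\cap A$. We introduce the subcubes

$$C_m=\prod_{j=1}^d\big[a_m^j,\,a_m^j+h\big[\ \subset A,\qquad m=1,\dots,M,$$

where $a_m^j$ is the $j$th coordinate of $a_m$. Let us define

$$\mathbf v(x)=\Big(\frac{v_1(x)}{\alpha(1)!},\dots,\frac{v_K(x)}{\alpha(K)!}\Big)^\top, \tag{12}$$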
which gives rise to the definition of the Taylor approximation $\bar b$ for $b$:

$$\bar b(x)=\sum_{i=1}^K D^{\alpha(i)}b(a_m)\,\frac{v_i(x-a_m)}{\alpha(i)!}\qquad\text{for }x\in C_m,\ m=1,\dots,M,$$

and $\bar b(x)=b^\circ(x)$ for $x\in\mathbb R^d\setminus A$ ($D^\alpha$ is applied coordinate-wise). Using this notation, the Taylor formula can be written as

$$b(x)=\bar b(x)+\sum_{|\alpha|=\lfloor\beta\rfloor}\big(D^\alpha b(\xi)-D^\alpha b(a_m)\big)\frac{(x-a_m)^\alpha}{\alpha!},\qquad x\in C_m, \tag{13}$$

where $\xi\in\mathbb R^d$ satisfies $|\xi-a_m|\le|x-a_m|$. This implies that for $V\in\mathcal H(\beta+1,L)$ the estimate $|\bar b(x)-b(x)|\lesssim h^\beta$ holds. We write



$$\bar\vartheta(x)=\bar b(x)-b^\circ(x),\qquad\vartheta(x)=b(x)-b^\circ(x)\qquad\text{and}\qquad\theta_j(x)=\big(D^{\alpha(1)}\vartheta_j(x),\dots,D^{\alpha(K)}\vartheta_j(x)\big)^\top$$

for $j=1,\dots,d$, and we shall use $\theta$ and $b$ equivalently to refer to the parameter in the local neighbourhood. The log-likelihood of the experiment defined via $\bar b$ is given by (see Liptser and Shiryaev [17, p. 271, (7.62)])



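$$\ell^T(\theta)=\sum_{m=1}^M\sum_{j=1}^d\Big(\theta_j(a_m)^\top\eta_{mj}(T)-\frac12\,\theta_j(a_m)^\top J_m(T)\,\theta_j(a_m)\Big), \tag{14}$$

where, for $t\in[0,T]$,

$$\eta_{mj}(t)=\int_0^t\mathbf 1_{C_m}(X_s)\,\mathbf v(X_s-a_m)\,dW_{s,j}\in\mathbb R^K,\qquad J_m(t)=\int_0^t\mathbf 1_{C_m}(X_s)\,\mathbf v(X_s-a_m)\mathbf v(X_s-a_m)^\top\,ds\in\mathbb R^{K\times K}, \tag{15}$$

and $W_{t,j}$ denotes the $j$th component of $W_t\in\mathbb R^d$.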
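The basis (12) and the Taylor approximation $\bar b$ are straightforward to set up. The sketch below (hypothetical names) enumerates the multi-indices with $|\alpha|\le\lfloor\beta\rfloor$ so that $\alpha(1)=0$, evaluates $\mathbf v$, and forms one coordinate of $\bar b$ on a cell from given derivatives at $a_m$. The derivatives are supplied by the user, since in the paper they belong to the unknown drift.

```python
import itertools
import math


def floor_strict(beta):
    """Largest integer strictly smaller than beta (the paper's floor(beta))."""
    return math.ceil(beta) - 1


def multi_indices(d, max_order):
    """All alpha in N^d with |alpha| <= max_order, ordered by |alpha| (so alpha(1) = 0)."""
    candidates = (a for a in itertools.product(range(max_order + 1), repeat=d) if sum(a) <= max_order)
    return sorted(candidates, key=lambda a: (sum(a), tuple(-ai for ai in a)))


def v_vector(x, alphas):
    """The vector (12): components x^alpha / alpha! in the order given by `alphas`."""
    out = []
    for a in alphas:
        mono, fact = 1.0, 1
        for xi, ai in zip(x, a):
            mono *= xi ** ai
            fact *= math.factorial(ai)
        out.append(mono / fact)
    return out


def taylor_approx(derivs_at_am, x, am, alphas):
    """One coordinate of b_bar on C_m: sum_i D^{alpha(i)} b_j(a_m) * v_i(x - a_m) / alpha(i)!."""
    v = v_vector([xi - ai for xi, ai in zip(x, am)], alphas)
    return sum(c * vi for c, vi in zip(derivs_at_am, v))
```

Since the Taylor approximation of order $\lfloor\beta\rfloor$ reproduces polynomials of that degree, evaluating it on a quadratic such as $1+2x_1+3x_2+x_1x_2+x_1^2$ (with $\lfloor\beta\rfloor=2$) recovers the polynomial exactly.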



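Design modification. Due to the ergodicity of $X$, the law of the log-likelihood (14) will for large $T$ be well approximated by

$$\sum_{m=1}^M\sum_{j=1}^d\Big(\sqrt T\,\theta_j(a_m)^\top\bar\eta_{mj}-\frac T2\,\theta_j(a_m)^\top J_m\,\theta_j(a_m)\Big), \tag{16}$$

where the $\bar\eta_{mj}\sim\mathcal N(0,J_m)$ are independent and

$$J_m=\int_{C_m}\mathbf v(x-a_m)\mathbf v(x-a_m)^\top\mu_{b^\circ}(x)\,dx. \tag{17}$$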



Equivalence for multidimensional diffusions 






Since

$$\sum_{m=1}^M\theta_j(a_m)^\top J_m\,\theta_j(a_m)=\int\big(\bar b_j(x)-b_j^\circ(x)\big)^2\mu_{b^\circ}(x)\,dx, \tag{18}$$

the process (16) (indexed by $\theta$) has exactly the same law as the log-likelihood of the Gaussian shift

$$d\bar Z(x)=\bar b(x)\sqrt{\mu_{b^\circ}(x)}\,dx+T^{-1/2}\,dB(x),\qquad\bar Z(0)=0,\quad x\in\mathbb R^d.$$

Under suitable assumptions on the smoothness of $b$, this last experiment is asymptotically equivalent to (11).

It remains to construct the random variables $(\bar\eta_{mj})$ on some enlargement of the probability space $(C([0,T];\mathbb R^d),\mathcal B_{C([0,T];\mathbb R^d)},P_{b^\circ}^T)$ such that $T^{-1/2}\eta_{mj}(T)$ and $\bar\eta_{mj}$ are close as random variables. We define the stopping time

$$\tau_m=\inf\big\{t\in[0,T]:\ \|J_m^{-1/2}J_m(t)J_m^{-1/2}\|\ge T\big\}\wedge T, \tag{19}$$

where the norm of a matrix $A$ is given by $\|A\|=\sup_x(|Ax|/|x|)$.

Let $\varepsilon=(\varepsilon_{mj})_{m,j}$ be a family of independent standard normal random vectors in $\mathbb R^K$, defined on an enlarged probability space such that $\varepsilon$ and $X$ are independent. We set

$$\bar\eta_{mj}=T^{-1/2}\eta_{mj}(\tau_m)+\big(J_m-T^{-1}J_m(\tau_m)\big)^{1/2}\varepsilon_{mj}.$$

By definition of $\tau_m$, the matrix $J_m-T^{-1}J_m(\tau_m)$ is nonnegative definite and its square root is well defined.
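For $K=1$ (that is, $\lfloor\beta\rfloor=0$ and $\mathbf v\equiv1$) all matrices become scalars, and the stopped statistic can be sketched on a time grid as follows. The grid, the function name and the guard against rounding are assumptions of this illustration. On a grid $J_m(t)$ may overshoot $TJ_m$ by one step, which the exact continuous-time construction avoids.

```python
import math


def stopped_design_statistic(indicators, dW, dt, T, J_m, eps):
    """Scalar (K = 1) sketch of tau_m from (19) and of the variable eta_bar_m.

    indicators[k] = 1_{C_m}(X_{t_k}) and dW[k] = W_{t_{k+1}} - W_{t_k} on a grid of step dt.
    J_m(t) = int_0^t 1_{C_m}(X_s) ds and eta_m(t) = int_0^t 1_{C_m}(X_s) dW_s are accumulated
    until J_m(t) >= T * J_m; the missing variance J_m - J_m(tau_m) / T is then filled in
    with the independent N(0, 1) draw eps.  Returns (tau_m, eta_bar_m).
    """
    J_t, eta_t, t = 0.0, 0.0, 0.0
    for ind, dw in zip(indicators, dW):
        if J_t >= T * J_m or t >= T:
            break
        J_t += ind * dt
        eta_t += ind * dw
        t += dt
    tau = min(t, T)
    topup = max(J_m - J_t / T, 0.0)  # nonnegative by definition of tau_m; guard against overshoot
    return tau, eta_t / math.sqrt(T) + math.sqrt(topup) * eps
```

If the path stays in $C_m$ all the time, the statistic is stopped as soon as $J_m(t)=TJ_m$ and no extra noise is added. If the path never visits $C_m$, $\bar\eta_m$ is pure noise with variance $J_m$.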



Proposition 2. Under the probability measure $P_{b^\circ}^T$ the random vectors $(\bar\eta_{mj})_{m,j}\subset\mathbb R^K$ are independent and each $\bar\eta_{mj}$ is centred Gaussian with covariance matrix $J_m$.

Proof. It suffices to show that for any sequence $(\lambda_{mj})_{m,j}\subset\mathbb R^K$ we have

$$E\Big[\exp\Big(i\sum_{m,j}\lambda_{mj}^\top\bar\eta_{mj}\Big)\Big]=\exp\Big(-\frac12\sum_{m,j}\lambda_{mj}^\top J_m\lambda_{mj}\Big),$$

where the expectation is taken with respect to $X$ following the law $P_{b^\circ}^T$ and $\varepsilon_{mj}$ being i.i.d. standard normal in $\mathbb R^K$, independent of $X$.

The verification of this equality is very similar to the proof of Proposition 2.13 in Dalalyan and Reiß [10] and is omitted. □
4. Equivalence with heteroskedastic Gaussian regression

The Gaussian experiment in Theorem 1 depends on the centre $b^\circ$ of the neighbourhood via $\mu_{b^\circ}$. This makes the passage from local to global equivalence difficult, especially because even in the one-dimensional case there is no known variance stabilising transform for (11), cf. Dalalyan and Reiß [10].

We propose here a method of deriving an asymptotically equivalent experiment that is independent of $b^\circ$, without using a variance stabilising transform. The idea is to discretise the Gaussian shift experiment with a "step of discretisation" larger than $1/T$. This method has already been used by Brown and Zhao for proving the asymptotic equivalence between regression models with random and deterministic designs.

We adopt the notation from Section 3.2. In addition, we introduce the $K\times K$-matrix $V=\int_{[0,1]^d}\mathbf v(x)\mathbf v(x)^\top\,dx$, where $\mathbf v(x)$ is defined by (12). Since $V$ is symmetric and strictly positive definite, the matrix $V^{-1/2}$ is well defined.
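Because $\mathbf v$ consists of scaled monomials, $V$ can be computed exactly from $\int_{[0,1]^d}x^{\alpha+\alpha'}\,dx=\prod_k(\alpha_k+\alpha'_k+1)^{-1}$. The following sketch (hypothetical names, exact rational arithmetic) builds $V$ and inverts it by Gauss-Jordan elimination. With the enumeration $\alpha(1)=0$, this also yields the entry $(V^{-1})_{11}$ that appears in Remark 8.

```python
import itertools
import math
from fractions import Fraction


def multi_indices(d, max_order):
    """All alpha in N^d with |alpha| <= max_order, ordered by |alpha| (so alpha(1) = 0)."""
    candidates = (a for a in itertools.product(range(max_order + 1), repeat=d) if sum(a) <= max_order)
    return sorted(candidates, key=lambda a: (sum(a), tuple(-ai for ai in a)))


def gram_matrix_V(d, max_order):
    """V = int_{[0,1]^d} v(x) v(x)^T dx for the vector (12), computed exactly."""
    alphas = multi_indices(d, max_order)
    V = []
    for a in alphas:
        row = []
        for b in alphas:
            entry = Fraction(1)
            for ai, bi in zip(a, b):
                # int_0^1 x^(ai + bi) dx / (ai! bi!)
                entry *= Fraction(1, (ai + bi + 1) * math.factorial(ai) * math.factorial(bi))
            row.append(entry)
        V.append(row)
    return V


def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular matrix of Fractions."""
    n = len(A)
    M = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]
```

For $d=1$ and $\lfloor\beta\rfloor=1$ one obtains the Hilbert-type matrix $V=\begin{pmatrix}1&1/2\\1/2&1/3\end{pmatrix}$ and $(V^{-1})_{11}=4$.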

Definition 3 (heteroskedastic Gaussian regression). Let $\Sigma$ be a subset of $C^{\lfloor\beta\rfloor}(\mathbb R^d;\mathbb R^d)$. For any $T,h>0$ we define $G(\Sigma,h,T)$ as the experiment of observing

$$Y_{im}=\Big(h^{|\alpha(k)|}D^{\alpha(k)}b_i(a_m)\Big)_{k=1,\dots,K}+\frac{V^{-1/2}\,\xi_{im}}{\sqrt{Th^d\mu_b(a_m)}} \tag{20}$$

for $i=1,\dots,d$, $m=1,\dots,M$, where $(\xi_{im})_{i,m}$ is a family of independent standard Gaussian random vectors in $\mathbb R^K$ and $b\in\Sigma$.

Note that the observations in this experiment are drawn from $\mathbb R^{KMd}$ according to a Gaussian measure. Both the mean and the variance of this measure depend on the parameter $b$, so that the experiment is heteroskedastic.

Theorem 2. If the assumptions of Theorem 1 are fulfilled and $h=h_T$ satisfies

$$\lim_{T\to\infty}Th^{2\beta}=\lim_{T\to\infty}Th^2\varepsilon_T^2=\lim_{T\to\infty}\eta_T^2h^{-d}=0,$$

then the diffusion experiments and the heteroskedastic Gaussian regression experiments are asymptotically equivalent, that is

$$\lim_{T\to\infty}\ \sup_{b^\circ\in\Sigma_\beta(L,M_1,M_2)}\Delta\big(E(\Sigma_{0,T},T),G(\Sigma_{0,T},h_T,T)\big)=0.$$
Proof. Theorem 1 yields the asymptotic equivalence of the experiment $E$ with the (translated) Gaussian shift experiment

$$dZ(x)=(b-b^\circ)(x)\sqrt{\mu_{b^\circ}(x)}\,dx+T^{-1/2}\,dB(x),\qquad x\in\mathbb R^d.$$

Let us introduce a new Gaussian shift:

$$d\bar Z(x)=\sum_{m=1}^M(\bar b-b^\circ)(x)\sqrt{\mu_{b^\circ}(a_m)}\,\mathbf 1_{C_m}(x)\,dx+T^{-1/2}\,dB(x),\qquad x\in\mathbb R^d.$$

Since $|\nabla\mu_b(x)|$ and $|\mu_b(x)|$ are uniformly bounded, the difference between the drifts of $Z$ and $\bar Z$ can be estimated as follows:

$$\Big|(b-b^\circ)(x)\sqrt{\mu_{b^\circ}(x)}-(\bar b-b^\circ)(x)\sqrt{\mu_{b^\circ}(a_m)}\Big|\le\Big|(b-\bar b)(x)\sqrt{\mu_{b^\circ}(a_m)}\Big|+\Big|(b-b^\circ)(x)\Big(\sqrt{\mu_{b^\circ}(x)}-\sqrt{\mu_{b^\circ}(a_m)}\Big)\Big|\lesssim h^\beta+\varepsilon h\qquad\forall x\in C_m.$$

Therefore, the Hellinger distance between the measures induced by $Z$ and $\bar Z$ tends to zero as $T\to\infty$ (Strasser, Rem. 69.8.(2)), provided that $T\varepsilon^2h^2\to0$ and $Th^{2\beta}\to0$. The log-likelihood of the experiment given by $\bar Z$ has exactly the same law as the log-likelihood of the Gaussian regression (20) with $\mu_b(a_m)$ replaced by $\mu_{b^\circ}(a_m)$, for $i=1,\dots,d$, $m=1,\dots,M$, where $(\xi_{im})_{i,m}$ is a family of independent standard Gaussian random vectors in $\mathbb R^K$ and $b\in\Sigma$. By Lemma 3 from Brown et al., the square of the Hellinger distance between the measures induced by these observations and by (20), respectively, is up to a constant bounded by $\sum_m(\mu_b(a_m)-\mu_{b^\circ}(a_m))^2/\mu_{b^\circ}(a_m)^2\lesssim M\eta^2$. Because of $Mh^d=|A|$ we infer $M\sim h^{-d}$, and the condition $h^{-d}\eta^2\to0$ as $T\to\infty$ implies that the Hellinger distance tends to zero uniformly in $b\in\Sigma_{0,T}$. Finally, the desired result follows by bounding the Le Cam distance between experiments by the supremum of the Hellinger distance between the corresponding measures, see e.g. Nussbaum [18, Eq. (12)]. □

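Remark 8. The experiment given by (20) is more informative than the experiment generated by the observations $(e_1^\top Y_{im})_{i,m}$, where $e_1=(1,0,\dots,0)^\top\in\mathbb R^K$. If we enumerate $\{\alpha(i)\}_i$ so that $\alpha(1)=0\in\mathbb N^d$, then $\tilde Y_m:=(e_1^\top Y_{1m},\dots,e_1^\top Y_{dm})^\top$ satisfies $\tilde Y_m=b(a_m)+\zeta_m/\sqrt{Th^d\mu_b(a_m)}$ with $\zeta_m/\sqrt{(V^{-1})_{11}}\sim\mathcal N(0,I_d)$ i.i.d. Therefore the diffusion experiment $E(\Sigma_{0,T},T)$ is asymptotically more informative than the regression experiment

$$Y_m=b(a_m)+\frac{\sqrt{(V^{-1})_{11}}\,\xi_m}{\sqrt{Th^d\mu_b(a_m)}},\qquad m=1,\dots,M, \tag{21}$$

with $\xi_m\sim\mathcal N(0,I_d)$ i.i.d.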
If, in view of Corollary 1, we choose $h_T=T^{-a}$, $\varepsilon_T=T^{-\beta/(2\beta+d)}$ and $\eta_T=T^{-(\beta+1)/(2\beta+d)}$, the condition of Theorem 2 takes the form

$$\max\Big(\frac1\beta,\ \frac{d}{2\beta+d}\Big)<2a<\frac{4(\beta+1)}{d(2\beta+d)}.$$

Such a value $a$ exists if and only if

$$\beta>\max\Big(\frac{d^2}{4}-1,\ \frac{d-2+\sqrt{(d-2)^2+4d^2}}{4}\Big).$$

For $d=2$ this inequality reduces to $\beta>1$. For $d\ge4$ it is equivalent to $\beta>d^2/4-1$. Note also that the logarithmic factors in $\varepsilon_T$ and $\eta_T$ do not affect this bound on the minimal regularity.
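The admissible range of the bandwidth exponent $a$ and the minimal regularity can be evaluated directly. The sketch below uses hypothetical function names and ignores logarithmic factors. It returns the open interval for $a$ (empty when the lower end is not below the upper end) and the threshold on $\beta$ displayed above.

```python
import math


def bandwidth_exponent_interval(beta, d):
    """Open interval (lo, hi) of exponents a, with h_T = T^(-a), allowed by the display above."""
    lo = 0.5 * max(1.0 / beta, d / (2 * beta + d))
    hi = 2.0 * (beta + 1) / (d * (2 * beta + d))
    return lo, hi


def minimal_beta_theorem2(d):
    """max(d^2 / 4 - 1, (d - 2 + sqrt((d - 2)^2 + 4 d^2)) / 4)."""
    return max(d * d / 4 - 1, (d - 2 + math.sqrt((d - 2) ** 2 + 4 * d * d)) / 4)
```

For $d=3$ the threshold is about $1.771$; slightly above it the interval for $a$ is non-empty, and slightly below it the interval is empty.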

As mentioned in the introduction, the result of Theorem 2 is new already in the one-dimensional case. When $d=1$, using a $\sqrt T$-consistent estimator of $\mu_b$ (Kutoyants [15, § 4.2]), the local neighbourhood can be attained as soon as $\beta>1/2$. Taking $K=1$ and using the globalisation method developed in [10], we obtain the global asymptotic equivalence of the diffusion experiment and the regression

$$Y_m=b(a_m)+\frac{\xi_m}{\sqrt{Th\,\mu_b(a_m)}},\qquad m=1,\dots,M,$$

provided that $h=h_T=T^{-a}$ with $(2\beta)^{-1}<a<1$ and the assumptions of [10, Thm. 3.5] are fulfilled.



5. Equivalence mapping 

The result of Theorem 1 implies in particular that there exists a Markov kernel $K$ from $(C([0,T];\mathbb R^d),\mathcal B_{C([0,T];\mathbb R^d)})$ to $(C(\mathbb R^d;\mathbb R^d),\mathcal B_{C(\mathbb R^d;\mathbb R^d)})$ such that

$$\lim_{T\to\infty}\ \sup_{b\in\Sigma_{0,T}}\|P_b^TK-Q_{b,T}\|_{TV}=0,$$

where $P_b^TK(A)=\int_{C([0,T];\mathbb R^d)}K(x,A)\,P_b^T(dx)$ for $A\in\mathcal B_{C(\mathbb R^d;\mathbb R^d)}$ and $\|\cdot\|_{TV}$ denotes the total variation norm. The aim of this section is to construct this Markov kernel explicitly. The construction is divided into two steps. First, we give the Markov kernel from the diffusion experiment to a suitable multivariate Gaussian regression. Then we give the Markov kernel from the Gaussian regression to the Gaussian shift experiment. An explicit Markov kernel in the other direction is not known, but it also seems less useful.

Assume that we have a path $X^T$ of the diffusion process (1) at our disposal. In what follows we use the notation introduced in Section 3.2, with $h$ verifying (27) below. For any $i=1,\dots,d$ we denote by $X_{t,i}$ the $i$th coordinate of $X_t$ and define the randomisation

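$$Y_{im}^{(1)}(X^T,\varepsilon)=\frac1T\int_0^{\tau_m}\mathbf 1_{C_m}(X_t)\,\mathbf v(X_t-a_m)\big(dX_{t,i}-b_i^\circ(X_t)\,dt\big)+\frac{1}{\sqrt T}\big(J_m-T^{-1}J_m(\tau_m)\big)^{1/2}\varepsilon_{im},\qquad m=1,\dots,M,$$

where $J_m(t)$, $J_m$ and $\tau_m$ are defined by (15), (17) and (19), and $\varepsilon=(\varepsilon_{im})_{i,m}$ is a family of independent (and independent of $X^T$) standard Gaussian vectors in $\mathbb R^K$. As is easily checked, the random vector $J_m^{-1}Y_{im}^{(1)}(X^T,\tilde\varepsilon)$ with $\tilde\varepsilon_{im}=(TJ_m-J_m(\tau_m))^{1/2}\theta_i(a_m)+\varepsilon_{im}$ has the same law as the Gaussian regression

$$Y_{im}=\theta_i(a_m)+(TJ_m)^{-1/2}\xi_{im}, \tag{22}$$

where the $\xi_{im}$ are independent standard Gaussian vectors in $\mathbb R^K$. We prove in Section 6 that the total variation between the laws of $\varepsilon$ and $\tilde\varepsilon$ tends to zero as $T\to\infty$. Consequently, if we denote by $K^{(1)}(x^T,\cdot)$ the law of $\{J_m^{-1}Y_{im}^{(1)}(x^T,\varepsilon);\ i=1,\dots,d;\ m=1,\dots,M\}$, we obtain a Markov kernel realising the asymptotic equivalence between the diffusion and the Gaussian regression (22).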

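For any $x\in C_m$ and for any $i\in\{1,\dots,d\}$, we define the randomisation of the regression (22) by

$$\Phi_{i,x}^{(2)}(Y,B)=\int_{R(a_m,x)}\big(b_i^\circ(u)+\mathbf v(u-a_m)^\top Y_{im}\big)\sqrt{\mu_{b^\circ}(u)}\,du+\frac{1}{\sqrt T}\int_{R(a_m,x)}dB_i(u)-\frac{1}{\sqrt T}\Big(\int_{R(a_m,x)}\mathbf v(u-a_m)^\top\sqrt{\mu_{b^\circ}(u)}\,du\Big)J_m^{-1}\int_{C_m}\mathbf v(u-a_m)\sqrt{\mu_{b^\circ}(u)}\,dB_i(u), \tag{23}$$

where $R(a_m,x)=\prod_{i=1}^d[a_m^i,x_i[$, $B=(B_1,\dots,B_d)$, and $B_1,\dots,B_d$ are independent $d$-variate Brownian sheets independent of $(Y_{im})_{i,m}$. Let us show that $\Phi^{(2)}(Y,B)=\big(\Phi_{i,x}^{(2)}(Y,B);\ i\in\{1,\dots,d\},\ x\in A\big)$ is an equivalence mapping from the Gaussian regression model (22) to the Gaussian shift model (11).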

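For any $x\in C_m$ and for any $i=1,\dots,d$ define the multivariate analogue of a Brownian bridge

$$V_i(x)=\int_{R(a_m,x)}\mathbf v(u-a_m)\sqrt{\mu_{b^\circ}(u)}\,dB_i(u)-\Big(\int_{R(a_m,x)}\mathbf v(u-a_m)\mathbf v(u-a_m)^\top\mu_{b^\circ}(u)\,du\Big)J_m^{-1}\int_{C_m}\mathbf v(u-a_m)\sqrt{\mu_{b^\circ}(u)}\,dB_i(u)$$

and set

$$U_i(x)=\Big(\int_{R(a_m,x)}\mathbf v(u-a_m)\mathbf v(u-a_m)^\top\mu_{b^\circ}(u)\,du\Big)Y_{im}+T^{-1/2}V_i(x).$$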
The process $U_i$ takes values in $\mathbb R^K$ and can be rewritten in the form $U_i(x)=\int_{R(a_m,x)}\mathbf v(u-a_m)\big(\bar b_i(u)-b_i^\circ(u)\big)\mu_{b^\circ}(u)\,du+T^{-1/2}W_i(x)$, where

$$W_i(x)=\Big(\int_{R(a_m,x)}\mathbf v(u-a_m)\mathbf v(u-a_m)^\top\mu_{b^\circ}(u)\,du\Big)J_m^{-1/2}\xi_{im}+V_i(x).$$

By construction, the process $W_i$ is centred Gaussian with covariance $E[W_i(x)W_i(y)^\top]=\int_{R(a_m,x)\cap R(a_m,y)}\mathbf v(u-a_m)\mathbf v(u-a_m)^\top\mu_{b^\circ}(u)\,du$. Assuming that $v_1,\dots,v_K$ are enumerated in such a way that $v_1(u)=1$, one checks that $\tilde B_i(x)=\int_{R(a_m,x)}\mu_{b^\circ}(u)^{-1/2}\,dW_{i,1}(u)$ is a $d$-variate Brownian sheet, where $W_{i,1}$ is the first coordinate of $W_i$. Therefore, the randomisation

$$\Phi_{i,x}^{(2)}(Y,B)=\int_{R(a_m,x)}b_i^\circ(u)\sqrt{\mu_{b^\circ}(u)}\,du+\int_{R(a_m,x)}\mu_{b^\circ}(u)^{-1/2}\,dU_{i,1}(u) \tag{24}$$

satisfies

$$d\Phi_i^{(2)}(x)=\bar b_i(x)\sqrt{\mu_{b^\circ}(x)}\,dx+T^{-1/2}\,d\tilde B_i(x),\qquad x\in C_m,\ i=1,\dots,d. \tag{25}$$

The total variation between the measures induced by (25) and (11) is, up to a constant, bounded by $\sqrt T\,h^\beta$, which tends to zero because of our choice of $h$ and the assumptions of Theorem 1. Moreover, the $d$-variate Brownian sheets $\tilde B_1,\dots,\tilde B_d$ are independent. Simple algebra shows that the two definitions (23) and (24) coincide. Hence the law $K^{(2)}(Y,\cdot)$ of $\Phi^{(2)}(Y,B)$ provides a Markov kernel from the Gaussian regression (22) to the Gaussian shift realising the asymptotic equivalence.

6. Proof of Theorem 1

6.1. Main part 

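As we have seen in Section 3.2, the construction of the Gaussian experiment makes use of an i.i.d. family $\varepsilon=(\varepsilon_{mj})_{m=1,\dots,M,\ j=1,\dots,d}$ of standard Gaussian vectors with values in $\mathbb R^K$. The canonical version of $\varepsilon$ is defined on the measurable space $(\mathbb R^{KMd},\mathcal B_{\mathbb R^{KMd}})$. We prove the asymptotic equivalence by a suitable coupling, which consists in constructing probability measures $\bar P_b^T$ and $\bar Q_b^T$ on the product space

$$(\mathfrak X,\mathcal B_{\mathfrak X}):=\big(C([0,T],\mathbb R^d)\times\mathbb R^{KMd},\ \mathcal B_{C([0,T],\mathbb R^d)}\otimes\mathcal B_{\mathbb R^{KMd}}\big)$$

such that

a) $E(\Sigma_{0,T},T)$ is equivalent to $\bar E(\Sigma_{0,T},T)=\big(\mathfrak X,\mathcal B_{\mathfrak X},(\bar P_b^T)_{b\in\Sigma_{0,T}}\big)$,

b) $\bar E(\Sigma_{0,T},T)$ and $\bar F(\Sigma_{0,T},T)=\big(\mathfrak X,\mathcal B_{\mathfrak X},(\bar Q_b^T)_{b\in\Sigma_{0,T}}\big)$ are asymptotically equivalent,

c) $\bar F(\Sigma_{0,T},T)$ is asymptotically equivalent to $F(\Sigma_{0,T},T)$.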
a) Define $\bar P_b^T$ to be the measure induced by the pair $(X^T,\varepsilon)$, where $X^T$ is given by (1) and $\varepsilon$ is a standard Gaussian vector independent of $X^T$, that is $\bar P_b^T=P_b^T\otimes\mathcal N_{KMd}$, with $\mathcal N_k$ denoting the standard normal law on $\mathbb R^k$. Then the equivalence $E\sim\bar E$ follows from the equality in law of the respective likelihood processes, cf. Strasser, Cor. 25.9.



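b) The measure $\bar Q_b^T$ is defined via

$$\bar Q_b^T(A\times B)=\int_{A\times B}\exp\big(\bar\ell_b(X^T,\varepsilon)\big)\,P_{b^\circ}^T(dX^T)\,\mathcal N_{KMd}(d\varepsilon)$$

for $A\in\mathcal B_{C([0,T],\mathbb R^d)}$ and $B\in\mathcal B_{\mathbb R^{KMd}}$, with

$$\bar\ell_b(X^T,\varepsilon)=\sum_{m=1}^M\sum_{j=1}^d\Big(\sqrt T\,\theta_j(a_m)^\top\bar\eta_{mj}(X^T,\varepsilon)-\frac T2\,\theta_j(a_m)^\top J_m\,\theta_j(a_m)\Big)$$

and

$$\bar\eta_{mj}(X^T,\varepsilon)=\frac{1}{\sqrt T}\int_0^{\tau_m}\mathbf 1_{C_m}(X_t)\,\mathbf v(X_t-a_m)\big(dX_{t,j}-b_j^\circ(X_t)\,dt\big)+\big(J_m-T^{-1}J_m(\tau_m)\big)^{1/2}\varepsilon_{mj}.$$

Because of $\bar\ell_{b^\circ}(X^T,\varepsilon)=0$, these definitions yield $\bar Q_{b^\circ}^T=\bar P_{b^\circ}^T$ and therefore $\log\big(\tfrac{d\bar Q_b^T}{d\bar Q_{b^\circ}^T}(X^T,\varepsilon)\big)=\bar\ell_b(X^T,\varepsilon)$. Proposition 2 combined with the classical formula for the characteristic function of a Gaussian vector implies that $\bar Q_b^T$ is a probability measure.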

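To prove the asymptotic equivalence of $\bar E$ and $\bar F$, it suffices to show that the Kullback-Leibler divergence between the measures $\bar P_b^T$ and $\bar Q_b^T$ tends to zero uniformly in $b\in\Sigma_{0,T}$ (see the proof of Thm. 2.16 in [10]). The Fubini theorem yields

$$\operatorname{KL}(\bar P_b^T,\bar Q_b^T)=E_b\Big[\log\frac{dP_b^T}{dP_{b^\circ}^T}(X^T)\Big]-\int E_b\big[\bar\ell_b(X^T,\varepsilon)\big]\,\mathcal N_{KMd}(d\varepsilon).$$

The Girsanov formula (Liptser and Shiryaev [17]) and the fact that the expectation of the stochastic integral is zero give

$$E_b\Big[\log\frac{dP_b^T}{dP_{b^\circ}^T}(X^T)\Big]=E_b\Big[\log\frac{\mu_b(X_0)}{\mu_{b^\circ}(X_0)}\Big]+\frac T2\int|\vartheta(x)|^2\mu_b(x)\,dx$$

$$=E_b\Big[\log\frac{\mu_b(X_0)}{\mu_{b^\circ}(X_0)}\Big]+\frac T2\int|\bar\vartheta(x)|^2\mu_b(x)\,dx+\frac T2\int|\vartheta(x)-\bar\vartheta(x)|^2\mu_b(x)\,dx+T\int\bar\vartheta(x)^\top\big(\vartheta(x)-\bar\vartheta(x)\big)\mu_b(x)\,dx.$$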
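Similarly, we find

$$E_b\Big[\int\ell_b(X^T,\varepsilon)\,\mathcal{N}_{KMd}(d\varepsilon)\Big]=\sum_{m=1}^{M}\sum_{j=1}^{d}\Big(-\frac{T}{2}\,\vartheta_{mj}^\top J_m\vartheta_{mj}+E_b\Big[\vartheta_{mj}^\top\int_0^{\tau_m\wedge T}\mathbf{1}_{C_m}(X_t)\,v(X_t-a_m)\,\vartheta_j(X_t)\,dt\Big]\Big).$$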

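Using for $f(x)=|\bar\vartheta(x)|^2$ and $f(x)=\bar\vartheta(x)^\top(\vartheta(x)-\bar\vartheta(x))$ the general identity

$$T\int f(x)\mu_b(x)\,dx=\sum_{m=1}^{M}E_b\Big[\int_0^T\mathbf{1}_{C_m}(X_t)f(X_t)\,dt\Big],$$

we obtain $\mathrm{KL}(\bar P_b^T,Q_b^T)=\sum_{i=1}^{4}T_i(\vartheta)$ with

$$\begin{aligned}T_1(\vartheta)&=E_b[\log\mu_b(X_0)-\log\mu_{b^\circ}(X_0)],\\T_2(\vartheta)&=\frac{T}{2}\int|\bar\vartheta(x)|^2\big(\mu_{b^\circ}(x)-\mu_b(x)\big)\,dx,\\T_3(\vartheta)&=\sum_{m=1}^{M}E_b\Big[\int_{\tau_m\wedge T}^{T}\bar\vartheta(X_t)^\top\vartheta(X_t)\,\mathbf{1}_{C_m}(X_t)\,dt\Big],\\T_4(\vartheta)&=\frac{T}{2}\int|\vartheta(x)-\bar\vartheta(x)|^2\mu_b(x)\,dx.\end{aligned}$$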
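The general identity above rests only on stationarity, Fubini's theorem and the disjointness of the cells $C_m$. As a one-dimensional illustration with hypothetical ingredients (an Ornstein–Uhlenbeck process $dX_t=-X_t\,dt+dW_t$ sampled exactly on a time grid, cells of width $h$ partitioning $[-1,1]$, and $f(x)=x^2$ on $[-1,1]$), the sketch compares the time average of $\sum_m\mathbf{1}_{C_m}(X_t)f(X_t)$ along one long path with $\int f\mu_b$ computed from the invariant law $N(0,1/2)$. The identity is a statement about expectations, so the single-path average only approximates it.

```python
import math
import numpy as np

def exact_integral(a=1.0, s2=0.5):
    """int_{-a}^{a} x^2 mu(x) dx for the Gaussian density mu of N(0, s2)."""
    s = math.sqrt(s2)
    z = a / s
    phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    return s2 * (math.erf(z / math.sqrt(2)) - 2 * z * phi)

def path_average(T=2000.0, dt=0.01, h=0.25, seed=0):
    """(1/T) * sum_m int_0^T 1_{C_m}(X_t) f(X_t) dt along one stationary OU path."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    rho = math.exp(-dt)                       # exact OU transition factor (theta = 1)
    sd = math.sqrt(0.5 * (1 - rho ** 2))      # exact transition standard deviation
    noise = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = rng.normal(0.0, math.sqrt(0.5))    # start in the invariant law
    for k in range(1, n):
        x[k] = rho * x[k - 1] + sd * noise[k]
    edges = np.arange(-1.0, 1.0 + h / 2, h)   # cells C_m = [edges[m], edges[m+1])
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_cell = (x >= lo) & (x < hi)
        total += np.sum(x[in_cell] ** 2) * dt  # Riemann sum of 1_{C_m}(X_t) f(X_t) dt
    return total / T

print(path_average(), exact_integral())
```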

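The Cauchy–Schwarz inequality implies that $|\bar\vartheta(x)|^2\le2|\vartheta(x)|^2+2|\vartheta(x)-\bar\vartheta(x)|^2$. The explicit form of the invariant density $\mu_b$ implies that $\sup_\vartheta T_1(\vartheta)\lesssim\varepsilon$. The Hölder assumption implies that $\sup_\vartheta\|\vartheta-\bar\vartheta\|_{L^\infty}\lesssim h^\beta$, and we infer

$$\sup_\vartheta T_2(\vartheta)\lesssim T(h^{2\beta}+\varepsilon^2)\eta,\qquad\sup_\vartheta T_4(\vartheta)\lesssim Th^{2\beta}.$$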

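In Section 6.2 below we prove that

$$T_3(\vartheta)\lesssim\big(T\eta+\psi_d(h^d)\sqrt{T}\big)\,\|b-b^\circ\|_{L^\infty}^2\qquad(26)$$

holds if $h=h_T$ tends to zero for $T\to\infty$. Hence, we obtain

$$\mathrm{KL}(\bar P_b^T,Q_b^T)\lesssim\varepsilon+Th^{2\beta}+T(\varepsilon^2+h^{2\beta})\eta+\psi_d(h^d)\sqrt{T}\,(\varepsilon^2+h^{2\beta}).$$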
Consequently, the rate-optimal choice of $h$ is

$$h=h_T\asymp(\varepsilon^4T^{-1})^{1/(4\beta+d-2)},\qquad(27)$$

provided that $h_T^{2\beta}=o(\varepsilon^2)$. Under the assumptions of the theorem we thus conclude that $\bar{\mathbf{E}}$ and $\bar{\mathbf{F}}$ are asymptotically equivalent.
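The choice (27) comes from balancing the bias term $Th^{2\beta}$ against the stochastic term $\varepsilon^2\sqrt{T}\,\psi_d(h^d)$. The sketch below checks this balance numerically under the loudly hypothetical assumption $\psi_d(h^d)=h^{1-d/2}$, taken only because it reproduces the exponent $4\beta+d-2$ in (27); the paper's own $\psi_d$ is defined elsewhere and may differ (in particular for $d\le2$), and all constants are ignored.

```python
import math

def rate_optimal_h(T, eps, beta, d):
    """Bandwidth from (27): h_T = (eps^4 / T)^(1 / (4 beta + d - 2))."""
    return (eps ** 4 / T) ** (1.0 / (4 * beta + d - 2))

def competing_terms(h, T, eps, beta, d):
    """Bias term T h^(2 beta) and stochastic term eps^2 sqrt(T) psi_d(h^d),
    using the HYPOTHETICAL choice psi_d(h^d) = h^(1 - d/2)."""
    bias = T * h ** (2 * beta)
    stochastic = eps ** 2 * math.sqrt(T) * h ** (1 - d / 2)
    return bias, stochastic

T, eps, beta, d = 1e6, 0.5, 2.0, 3            # hypothetical parameter values
h = rate_optimal_h(T, eps, beta, d)
bias, stochastic = competing_terms(h, T, eps, beta, d)
print(f"h_T = {h:.4f}, bias = {bias:.6f}, stochastic = {stochastic:.6f}")
```

Since the bias term increases and the stochastic term decreases in $h$ (for $d\ge3$), moving $h$ away from $h_T$ in either direction increases the larger of the two.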

c) It remains to verify that the statistical experiment $\bar{\mathbf{F}}$ defined via $Q_b^T$ is asymptotically equivalent to the experiment $\mathbf{F}$. We have already seen that

$$\log\frac{dQ_b^T}{dQ_{b^\circ}^T}(X^T,\varepsilon)=\ell_b(X^T,\varepsilon).$$

Recall that according to Proposition 2 the random vectors $(\eta_{mj})_{m,j}$ are independent Gaussian with covariance matrix $TJ_m$. Therefore, the law of the likelihood process $(dQ_b^T/dQ_{b^\circ}^T)_{b\in\Sigma_{0,T}}$ coincides with the law of the process $(d\tilde Q_b^T/d\tilde Q_{b^\circ}^T)_{b\in\Sigma_{0,T}}$. This gives the equivalence of the experiments $\bar{\mathbf{F}}$ and $\tilde{\mathbf{F}}$, where the latter experiment, with laws $\tilde Q_b^T$, is defined by the observation

$$dZ(x)=\bar b(x)\sqrt{\mu_{b^\circ}(x)}\,dx+T^{-1/2}\,dB(x),\quad Z(0)=0,\ x\in\mathbb{R}^d.\qquad(28)$$

To conclude, we remark that the Kullback–Leibler divergence between the Gaussian experiments $\tilde{\mathbf{F}}$ and $\mathbf{F}$ is bounded by $T\int_{\mathbb{R}^d}|b-\bar b|^2\mu_{b^\circ}\lesssim Th^{2\beta}$, which in view of (27) tends to zero for $T\to\infty$. □
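The last step uses that for two Gaussian white-noise observations $dZ=f\,dx+T^{-1/2}dB$ and $dZ=g\,dx+T^{-1/2}dB$ the Kullback–Leibler divergence equals $\frac T2\int|f-g|^2\,dx$, here with $f=b\sqrt{\mu_{b^\circ}}$ and $g=\bar b\sqrt{\mu_{b^\circ}}$. As a one-dimensional illustration with hypothetical ingredients ($b(x)=\sin(2\pi x)$, $\mu_{b^\circ}$ uniform on $[0,1]$, $\bar b$ the cell-wise mean, so smoothness $\beta=1$), the sketch shows the $h^{2\beta}=h^2$ decay: halving $h$ divides the divergence by roughly four.

```python
import numpy as np

def kl_piecewise_constant(T, n_cells, n_grid=2 ** 14):
    """(T/2) * int_0^1 (b - b_bar)^2 mu dx for the hypothetical choice
    b(x) = sin(2 pi x), mu = uniform density on [0, 1], b_bar = cell means."""
    x = (np.arange(n_grid) + 0.5) / n_grid            # midpoint grid on [0, 1]
    b = np.sin(2 * np.pi * x)
    cells = b.reshape(n_cells, n_grid // n_cells)     # one row per cell of width h
    b_bar = np.repeat(cells.mean(axis=1), n_grid // n_cells)
    return 0.5 * T * np.mean((b - b_bar) ** 2)        # mu = 1, so the integral is a mean

T = 100.0
kl8 = kl_piecewise_constant(T, 8)     # h = 1/8
kl16 = kl_piecewise_constant(T, 16)   # h = 1/16
print(kl8, kl16, kl8 / kl16)
```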



6.2. Evaluation of $T_3(\vartheta)$



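We start by sketching how the estimate could be reduced to a purely analytical problem, using

$$\begin{aligned}T_3(\vartheta)&\lesssim\|b-b^\circ\|_{L^\infty}^2\sum_{m=1}^{M}E_b\Big[\int_{\tau_m\wedge T}^{T}\mathbf{1}_{C_m}(X_t)\,dt\Big]\\&\le\|b-b^\circ\|_{L^\infty}^2\Big(\sup_mE_b[(T-\tau_m)^+]+\sum_{m=1}^{M}E_b\Big[\int_{\tau_m\wedge T}^{T}\big(\mathbf{1}_{C_m}(X_t)-\mu_b(C_m)\big)\,dt\Big]\Big).\end{aligned}\qquad(29)$$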



If $f$ is a function in the domain of the generator $L_b$ of the semigroup $(P_{b,t})_{t\ge0}$ with $L_bf=\mathbf{1}_{C_m}-\mu_b(C_m)$, then Dynkin's formula and the fact that $\mathbf{1}_{C_m}(X_t)-\mu_b(C_m)$ is centred yield

$$E_b\Big[\int_{\tau_m\wedge T}^{T}\big(\mathbf{1}_{C_m}(X_t)-\mu_b(C_m)\big)\,dt\Big]=-E_b[f(X_{\tau_m\wedge T})]\le\sup_x|f(x)|.$$

Unfortunately, a suitably tight supremum-norm estimate for $f=L_b^{-1}(\mathbf{1}_{C_m}-\mu_b(C_m))$ could not be found in the literature.
We therefore proceed differently and make use of the mixing properties of $X$. Fix some $\Delta=\Delta(T)>0$. Since for $\tau_m\ge T-\Delta$ the integral over $[\tau_m,T]$ is smaller than the integral over $[T-\Delta,T]$, we have

$$E_b\Big[\int_{\tau_m\wedge T}^{T}\mathbf{1}_{C_m}(X_t)\,dt\Big]\le\Delta\,\mu_b(C_m)+E_b\Big[\mathbf{1}_{\{\tau_m < T-\Delta\}}\int_{\tau_m}^{T}\mathbf{1}_{C_m}(X_t)\,dt\Big].\qquad(30)$$



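Lemma 1. Under the assumptions of Proposition […] we obtain

$$E_b\Big[\mathbf{1}_{\{\tau_m < T-\Delta\}}\int_{\tau_m}^{\tau_m+\Delta}\mathbf{1}_{\{X_t\in C_m\}}\,dt\Big]\lesssim\Delta\,\mu_b(C_m)+\psi_d(h^d)\sqrt{T\,h^d\mu_b(C_m)}.$$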



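Proof. Because of $[\tau_m,\tau_m+\Delta]\subset[(i-1)\Delta,(i+1)\Delta]$ for some $1\le i\le\lceil T/\Delta\rceil$ we get

$$\int_{\tau_m}^{\tau_m+\Delta}\mathbf{1}_{C_m}(X_s)\,ds\le\max_{i=1,\dots,\lceil T/\Delta\rceil}\int_{(i-1)\Delta}^{(i+1)\Delta}\mathbf{1}_{C_m}(X_s)\,ds.$$

Set $U_i:=\int_{(i-1)\Delta}^{(i+1)\Delta}\mathbf{1}_{C_m}(X_s)\,ds-2\Delta\mu_b(C_m)$. By separating the bias from the stochastic term, we find

$$\int_{\tau_m}^{\tau_m+\Delta}\mathbf{1}_{C_m}(X_s)\,ds\le2\Delta\mu_b(C_m)+\max_{i=1,\dots,\lceil T/\Delta\rceil}|U_i|,$$

and by the Cauchy–Schwarz inequality and stationarity

$$E_b\Big[\max_i|U_i|\Big]\le\Big(\sum_{i=1}^{\lceil T/\Delta\rceil}E_b[U_i^2]\Big)^{1/2}=\lceil T/\Delta\rceil^{1/2}\operatorname{Var}_b\Big(\int_0^{2\Delta}\mathbf{1}_{C_m}(X_s)\,ds\Big)^{1/2}.$$

We conclude by an application of Proposition […]. □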
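The Cauchy–Schwarz step in this proof only uses $\max_i|U_i|\le(\sum_iU_i^2)^{1/2}$ followed by Jensen's inequality, so it holds for any joint law of $(U_1,\dots,U_n)$, including an empirical one. The sketch below evaluates both sides for hypothetical i.i.d. Gaussian blocks standing in for the centred occupation times $U_i$; it also shows how much room the bound leaves in this example.

```python
import numpy as np

def max_and_l2_bound(U):
    """U[k, i] is the k-th draw of U_i. Returns the empirical E[max_i |U_i|]
    and the bound (sum_i E[U_i^2])^{1/2} under the same empirical law."""
    lhs = float(np.mean(np.max(np.abs(U), axis=1)))
    rhs = float(np.sqrt(np.sum(np.mean(U ** 2, axis=0))))
    return lhs, rhs

rng = np.random.default_rng(0)
U = rng.standard_normal((10_000, 50))   # hypothetical: 50 blocks, 10 000 draws
lhs, rhs = max_and_l2_bound(U)
print(f"E max|U_i| ~ {lhs:.3f} <= {rhs:.3f}")
```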



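Lemma 2. If Assumption […] is satisfied, then

$$E_b\Big[\mathbf{1}_{\{\tau_m < T-\Delta\}}\int_{\tau_m+\Delta}^{T}\mathbf{1}_{C_m}(X_t)\,dt\Big]\le\mu_b(C_m)\int_0^{T-\Delta}P_b(\tau_m < t)\,dt+Te^{-\rho\Delta}\sqrt{\mu_b(C_m)}.$$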



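Proof. We have

$$\begin{aligned}E_b\Big[\mathbf{1}_{\{\tau_m < T-\Delta\}}\int_{\tau_m+\Delta}^{T}\mathbf{1}_{C_m}(X_t)\,dt\Big]&=E_b\Big[\int_\Delta^T\mathbf{1}_{\{\tau_m < t-\Delta\}}\mathbf{1}_{C_m}(X_t)\,dt\Big]\\&=\mu_b(C_m)\int_\Delta^TP_b(\tau_m < t-\Delta)\,dt+\int_\Delta^TE_b\big[(\mathbf{1}_{C_m}(X_t)-\mu_b(C_m))\,\mathbf{1}_{\{\tau_m < t-\Delta\}}\big]\,dt.\end{aligned}$$

Using the Markov property of the process $(X_t)$ and the spectral gap inequality from Assumption […] we infer that

$$\begin{aligned}E_b\big[(\mathbf{1}_{C_m}(X_t)-\mu_b(C_m))\,\mathbf{1}_{\{\tau_m < t-\Delta\}}\big]&=E_b\big[P_{b,\Delta}(\mathbf{1}_{C_m}-\mu_b(C_m))(X_{t-\Delta})\,\mathbf{1}_{\{\tau_m < t-\Delta\}}\big]\\&\le\|P_{b,\Delta}(\mathbf{1}_{C_m}-\mu_b(C_m))\|_{L^2(\mu_b)}\le e^{-\rho\Delta}\sqrt{\mu_b(C_m)}.\end{aligned}$$

This inequality completes the proof of the lemma. □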
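The spectral gap step can be made concrete for the one-dimensional Ornstein–Uhlenbeck process $dX_t=-\theta X_t\,dt+dW_t$, whose invariant law is $N(0,1/(2\theta))$ and whose spectral gap is $\rho=\theta$. This example is not part of the paper's assumptions; it is chosen only because the transition law $N(xe^{-\theta\Delta},(1-e^{-2\theta\Delta})/(2\theta))$ is explicit. The sketch evaluates $\|P_\Delta(\mathbf{1}_C-\mu(C))\|_{L^2(\mu)}$ by quadrature for a hypothetical cell $C=[0,1]$ and compares it with $e^{-\rho\Delta}\|\mathbf{1}_C-\mu(C)\|_{L^2(\mu)}\le e^{-\rho\Delta}\sqrt{\mu(C)}$.

```python
import math
import numpy as np

def std_normal_cdf(z):
    """Standard normal CDF, applied elementwise via math.erf."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(z, dtype=float) / math.sqrt(2.0)))

def ou_semigroup_contraction(theta, a, b, delta, n_grid=4001):
    """For dX = -theta X dt + dW and C = [a, b], return
    (||P_delta f||, ||f||, mu(C)) in L^2(mu) with f = 1_C - mu(C)."""
    s2 = 1.0 / (2.0 * theta)                      # variance of the invariant law
    s = math.sqrt(s2)
    x = np.linspace(-8 * s, 8 * s, n_grid)        # quadrature grid for L^2(mu)
    dx = x[1] - x[0]
    mu = np.exp(-x ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
    mu_C = float(std_normal_cdf(b / s) - std_normal_cdf(a / s))
    mean = x * math.exp(-theta * delta)           # E[X_delta | X_0 = x]
    sd = math.sqrt(s2 * (1.0 - math.exp(-2 * theta * delta)))  # sd of X_delta given X_0
    P_f = std_normal_cdf((b - mean) / sd) - std_normal_cdf((a - mean) / sd) - mu_C
    f = ((x >= a) & (x <= b)).astype(float) - mu_C
    def l2_norm(g):
        return math.sqrt(float(np.sum(g ** 2 * mu) * dx))
    return l2_norm(P_f), l2_norm(f), mu_C

theta, delta = 1.0, 0.5                           # hypothetical gap and lag
pf, f_norm, mu_C = ou_semigroup_contraction(theta, 0.0, 1.0, delta)
print(pf, math.exp(-theta * delta) * f_norm, math.exp(-theta * delta) * math.sqrt(mu_C))
```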

Lemma 3. We have uniformly over $m=1,\dots,M$:

$$P_b(\tau_m < t)\lesssim\frac{t^2\eta^2+t\,\psi_d(h^d)^2}{(T-t)^2}.$$

Proof. Note that $M_t:=J_m^{-1/2}\eta_{mj}(t)$ is a martingale under $P_{b^\circ}$ with quadratic variation matrix $\langle M\rangle_t=J_m^{-1/2}J_m(t)J_m^{-1/2}$. We obtain that $E_{b^\circ}[\langle M\rangle_t]=tI_K$ with the $K\times K$ unit matrix $I_K$ and

$$P_b(\tau_m < t)\le P_b(\|\langle M\rangle_t\|\ge T)\le P_b(\|\langle M\rangle_t-tI_K\|\ge T-t)\le\frac{E_b[\|\langle M\rangle_t-tI_K\|^2]}{(T-t)^2}.$$

Let $J_h\in\mathbb{R}^{K\times K}$ be the diagonal matrix with entries $(J_h)_{ii}$, $i=1,\dots,K$, such that $v(hu)=J_hv(u)$; then

$$\|\langle M\rangle_t-tI_K\|=\|J_m^{-1/2}(J_m(t)-tJ_m)J_m^{-1/2}\|\le\|J_m^{-1/2}J_h\|^2\,\|J_h^{-1}(J_m(t)-tJ_m)J_h^{-1}\|.$$

Simple algebra shows that $\|J_m^{-1/2}J_h\|^2=\|(J_h^{-1}J_mJ_h^{-1})^{-1}\|$, since $J_h^\top=J_h$, and

$$J_h^{-1}J_mJ_h^{-1}=h^d\int_{[0,1]^d}v(u)v(u)^\top\mu_{b^\circ}(a_m+uh)\,du.$$

This matrix is strictly positive definite and $\|h^{-d}\mu_{b^\circ}(a_m)^{-1}J_h^{-1}J_mJ_h^{-1}-V\|$, with $V:=\int_{[0,1]^d}v(u)v(u)^\top du$, tends to zero as $h\to0$. Hence, by the continuity of the matrix inversion we obtain for $h$ small enough

$$\|h^d\mu_{b^\circ}(a_m)\,J_hJ_m^{-1}J_h\|\le2\|V^{-1}\|.$$

We conclude that $\|J_m^{-1/2}J_h\|^2\lesssim\mu_{b^\circ}(C_m)^{-1}$. Set now $H_t:=J_h^{-1}(J_m(t)-tJ_m)J_h^{-1}$. It is easily checked that each entry $H_{t,ij}$ can be written as $\int_0^tf(X_s)\,ds-t\int_{\mathbb{R}^d}f(x)\mu_{b^\circ}(x)\,dx$, where $f$ is a function bounded by $1$ and supported by $C_m$. Thus, a bias–variance decomposition combined with Proposition […] yields

$$E_b[H_{t,ij}^2]\lesssim\Big(t\int_{C_m}|\mu_b(x)-\mu_{b^\circ}(x)|\,dx\Big)^2+t\,h^d\psi_d(h^d)^2\mu_b(C_m).$$

Since in view of Remark […] $\mu_b(C_m)$ and $\mu_{b^\circ}(C_m)$ are both of order $h^d$ and all norms in $\mathbb{R}^{K\times K}$ are equivalent, we arrive at the desired estimate. □
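The matrix-scaling step can be checked numerically in dimension $d=1$. The sketch assumes a monomial basis $v(u)=(1,u,u^2)^\top$ (so that $J_h=\operatorname{diag}(1,h,h^2)$ satisfies $v(hu)=J_hv(u)$), a single cell $C_m=[a_m,a_m+h]$ and the standard normal density as a stand-in for $\mu_{b^\circ}$; all of these are hypothetical choices. It verifies $\|J_m^{-1/2}J_h\|^2=\|(J_h^{-1}J_mJ_h^{-1})^{-1}\|$ in the spectral norm and that $h^{-1}\mu_{b^\circ}(a_m)^{-1}J_h^{-1}J_mJ_h^{-1}$ approaches $V=\int_0^1v(u)v(u)^\top du$, here the Hilbert matrix $(1/(i+j+1))_{i,j}$.

```python
import numpy as np

def density(x):
    """Hypothetical stand-in for mu_{b0}: the standard normal density."""
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

def local_gram(a, h, degree=2, n_quad=4000):
    """J_m = int_a^{a+h} v(x - a) v(x - a)^T mu(x) dx by the midpoint rule,
    with the monomial basis v(u) = (1, u, ..., u^degree)."""
    u = (np.arange(n_quad) + 0.5) / n_quad
    x = a + h * u
    B = np.vander(x - a, degree + 1, increasing=True)   # row k is v(x_k - a)^T
    return h * (B.T * density(x)) @ B / n_quad

def scaling_check(a, h, degree=2):
    J_m = local_gram(a, h, degree)
    powers = np.arange(degree + 1, dtype=float)
    J_h = np.diag(h ** powers)                 # v(h u) = J_h v(u) for monomials
    J_h_inv = np.diag(h ** -powers)
    S = J_h_inv @ J_m @ J_h_inv                # = h * int_0^1 v v^T mu(a + u h) du
    V = 1.0 / (np.add.outer(powers, powers) + 1.0)   # int_0^1 u^(i+j) du
    deviation = np.max(np.abs(S / (h * density(a)) - V))
    w, Q = np.linalg.eigh(J_m)                 # J_m^{-1/2} via eigendecomposition
    J_m_inv_sqrt = Q @ np.diag(w ** -0.5) @ Q.T
    lhs = np.linalg.norm(J_m_inv_sqrt @ J_h, 2) ** 2
    rhs = np.linalg.norm(np.linalg.inv(S), 2)
    return deviation, lhs, rhs

for h in (0.05, 0.01):
    dev, lhs, rhs = scaling_check(a=0.3, h=h)
    print(f"h={h}: max|scaled - V| = {dev:.2e}, lhs = {lhs:.4f}, rhs = {rhs:.4f}")
```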

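Using the last lemma and $E_b[(T-\tau_m)^+]=\int_0^TP_b(\tau_m < t)\,dt$, we obtain

$$\sup_m\int_0^TP_b(\tau_m < t)\,dt\lesssim\int_0^T\min\Big(1,\frac{t^2\eta^2+t\,\psi_d(h^d)^2}{(T-t)^2}\Big)\,dt.$$

Setting $c_T:=T^{-1/2}\psi_d(h^d)$, we get

$$\int_0^T\min\Big(1,\frac{t\,\psi_d(h^d)^2}{(T-t)^2}\Big)\,dt=T\int_0^1\min\Big(1,\frac{c_T^2(1-v)}{v^2}\Big)\,dv\le T\int_0^{c_T}dv+T\int_{c_T}^{\infty}c_T^2v^{-2}\,dv=2Tc_T=2T^{1/2}\psi_d(h^d).$$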
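Both elementary integral bounds, the one just derived and the analogous bound $\int_0^T\min(1,t^2\eta^2/(T-t)^2)\,dt\le2T\eta$, can be checked by quadrature. The sketch evaluates $\int_0^T\min(1,r(t)/(T-t)^2)\,dt$ with a midpoint rule for $r(t)=t\,\psi^2$ and $r(t)=t^2\eta^2$ and compares with $2T^{1/2}\psi$ and $2T\eta$; the numerical values of $T$, $\psi=\psi_d(h^d)$ and $\eta$ are hypothetical.

```python
import math
import numpy as np

def integral_min(T, rate, n=200_000):
    """Midpoint rule for int_0^T min(1, rate(t) / (T - t)^2) dt."""
    t = (np.arange(n) + 0.5) * (T / n)
    return float(np.sum(np.minimum(1.0, rate(t) / (T - t) ** 2)) * (T / n))

T, psi, eta = 1e4, 3.0, 0.01                  # hypothetical values
val_psi = integral_min(T, lambda t: t * psi ** 2)
val_eta = integral_min(T, lambda t: (t * eta) ** 2)
print(val_psi, 2 * math.sqrt(T) * psi)        # compare with 2 T c_T
print(val_eta, 2 * T * eta)                   # compare with 2 T eta
```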

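In the same way we obtain $\int_0^T\min\big(1,t^2\eta^2/(T-t)^2\big)\,dt\le2T\eta$. Substituting all estimates into (30) and (29), we obtain

$$T_3(\vartheta)\lesssim\|b-b^\circ\|_{L^\infty}^2\big(\Delta+T\eta+\psi_d(h^d)\sqrt{T}+Th^{-d/2}e^{-\rho\Delta}\big).$$

Thus choosing $\Delta(T)=\psi_d(h^d)\sqrt{T}$ we get

$$T_3(\vartheta)\lesssim\|b-b^\circ\|_{L^\infty}^2\big(T\eta+\psi_d(h^d)\sqrt{T}\big),$$

provided that $h=h(T)$ tends to zero as $T\to\infty$.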